RANDOM MULTIPLICATIVE FUNCTIONS IN SHORT INTERVALS



SOURAV CHATTERJEE AND KANNAN SOUNDARARAJAN 

Abstract. We consider random multiplicative functions taking 
the values ±1. Using Stein's method for normal approximation, 
we prove a central limit theorem for the sum of such multiplicative 
functions in appropriate short intervals. 



1. Introduction 

Many of the functions of interest to number theorists are multiplicative.
That is, they satisfy $f(mn) = f(m)f(n)$ for all coprime natural
numbers $m$ and $n$. Some examples are the Möbius function $\mu(n)$, the
function $n^{it}$ for a real number $t$, and Dirichlet characters $\chi(n)$. Often
one is interested in the behavior of partial sums $\sum_{n \le x} f(n)$ of such
multiplicative functions. For the prototypical examples mentioned above
it is a difficult problem to obtain a good understanding of such partial 
sums. A guiding principle that has emerged is that partial sums of spe- 
cific multiplicative functions (e.g. characters or the Mobius function) 
behave like partial sums of random multiplicative functions. By ran- 
dom we mean that the values of the multiplicative function at primes 
are chosen randomly, and the values at all natural numbers are built 
out of the values at primes by the multiplicative property. For example 
this viewpoint is explored in the context of finding large character sums 
in 0]. 

This raises the question of the distribution of partial sums of random 
multiplicative functions, and even this model problem appears difficult 
to resolve. The aim of this paper is to study the distribution of random
multiplicative functions in short intervals $[x, x+y]$, and in suitable
ranges we shall establish that the sum of a random multiplicative function
in that range has an approximately Gaussian distribution.

Throughout $p$ will denote a prime number, and $X(p)$ will denote
independent random variables taking the values $+1$ or $-1$ with equal
probability. Let $X(n) = 0$ if $n$ is divisible by the square of any prime,
and if $n = p_1 \cdots p_k$ is square-free we define $X(n) = \prod_{j=1}^{k} X(p_j)$. Let
$M(x) = \sum_{n \le x} X(n)$. Halász showed that with probability 1 we
have

\[ |M(x)| \le c\, x^{1/2} \exp\big( d\, (\log\log x \, \log\log\log x)^{1/2} \big) \]

for some positive constants $c$ (which may depend on the random function
$X$) and $d$ (an absolute constant), and forthcoming work of Lau,
Tenenbaum and Wu [10] substantially improves upon this bound. Furthermore,
Halász showed that with positive probability the estimate
$M(x) > c\, x^{1/2} \exp\big(-d\, (\log\log x \, \log\log\log x)^{1/2}\big)$ holds infinitely often (for
any $d > 0$), and this has been substantially improved in forthcoming
work of Harper [7]. These results may be seen as approximations
to the law of the iterated logarithm for sums of independent random
variables. In related recent works Hough [8] and Harper [0] have considered
the distribution of $\sum_{n \le x} X(n)$ where the sum is restricted to
integers having exactly $k$ prime factors. Note that the central limit
theorem covers the case $k = 1$ when we have a sum of independent
random variables. When $k$ is a fixed positive integer, using the method
of moments Hough established that such sums have a Gaussian distribution.
The work of Harper extends Hough's result and using the
martingale central limit theorem he established that the Gaussian distribution
persists for $k = o(\log\log x)$, and fails for $k$ of size a constant
times $\log\log x$. Recall that most numbers $n \le x$ have about $\log\log x$
prime factors, and so the dichotomy seen in Harper's result is quite interesting.
Harper also showed by a conditioning argument that $M(x)$
itself cannot have a normal distribution with mean 0 and variance the
number of square-free integers below $x$.
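To make the definition of $X(n)$ concrete, here is a minimal Python sketch that samples one realisation of a random multiplicative function up to a bound `N` and evaluates the partial sum $M(x)$. The function names `sample_random_multiplicative` and `partial_sum` are illustrative choices and do not come from the paper.

```python
import random

def sample_random_multiplicative(N, rng):
    """Return a list X with X[n] = X(n) for 1 <= n <= N (X[0] is unused).

    Hypothetical helper: X(p) is +1 or -1 with equal probability at primes,
    X(n) is the product of X(p) over p | n for square-free n, and 0 otherwise.
    """
    X = [1] * (N + 1)
    X[0] = 0
    is_composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if is_composite[p]:
            continue
        sign = rng.choice((-1, 1))           # the random value X(p)
        for m in range(p, N + 1, p):
            if m > p:
                is_composite[m] = True
            X[m] *= sign                     # every multiple of p picks up X(p)
        for m in range(p * p, N + 1, p * p):
            X[m] = 0                         # p^2 | m forces X(m) = 0
    return X

def partial_sum(X, x):
    """M(x) = sum of X(n) over 1 <= n <= x."""
    return sum(X[1:x + 1])
```

Calling `sample_random_multiplicative(10**5, random.Random(1))` and then `partial_sum` at several points gives one sample path of $M(x)$; repeating with different seeds gives an empirical picture of its fluctuations.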

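Theorem 1.1. Let $X$ denote a random multiplicative function as above.
Let $x$ and $y$ be large natural numbers with $y = \delta x$ for some $\delta \le 1/10$.
Let $S = S(x, y)$ denote the number of square-free integers in $[x, x + y]$.
Let $Z$ denote a Gaussian random variable with mean 0 and variance 1,
and let $\phi$ denote a Lipschitz function satisfying $|\phi(\alpha) - \phi(\beta)| \le |\alpha - \beta|$
for all real numbers $\alpha$ and $\beta$. Then we have that

\[ \Big| \mathbb{E}\,\phi\Big( \frac{1}{\sqrt{S}} \sum_{x < n \le x+y} X(n) \Big) - \mathbb{E}\,\phi(Z) \Big| \]

is bounded by a constant times

\[ \Big(\frac{y}{S}\Big)^{3/2} \frac{1}{(\log 1/\delta)^{1/2}} + \frac{y}{S}\,(\delta \log x)^{1/2} + \frac{y \log x}{S^{3/2} \log y}. \]
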
We recall that the Kantorovich-Wasserstein distance between two
probability measures $\mu$ and $\nu$ on the real line, denoted $\mathcal{W}(\mu, \nu)$, is
defined as the supremum of $\big| \int h\, d\mu - \int h\, d\nu \big|$ over all Lipschitz functions
$h$ satisfying $|h(\alpha) - h(\beta)| \le |\alpha - \beta|$ for all real numbers $\alpha$
and $\beta$. Thus our Theorem gives an estimate for the Kantorovich-Wasserstein
distance between a normal distribution with mean zero
and variance 1, and the distribution of sums of random multiplicative
functions in short intervals. An intuitive way to assess the distance
between two probability measures is the Kolmogorov statistic:
$\mathcal{K}(\mu, \nu) = \sup_{t \in \mathbb{R}} \big| \int_{-\infty}^{t} d\mu - \int_{-\infty}^{t} d\nu \big|$. By a standard smoothing argument,
we shall show how our estimate for the Kantorovich-Wasserstein
distance can be used to bound the Kolmogorov statistic.
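As an illustration of the Kolmogorov statistic, the following Python sketch computes $\mathcal{K}(\mu, \nu)$ when $\mu$ is the empirical measure of a finite sample and $\nu$ is the standard normal distribution. The names `normal_cdf` and `kolmogorov_statistic` are illustrative assumptions; the paper itself works with the exact distribution, not with samples.

```python
import math

def normal_cdf(t):
    """Distribution function of a Gaussian with mean 0 and variance 1."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def kolmogorov_statistic(samples):
    """sup_t |F_n(t) - Phi(t)| for the empirical distribution F_n of the samples.

    The supremum is attained at (or just before) a sample point, so it is
    enough to compare Phi with the empirical CDF on both sides of each jump.
    """
    xs = sorted(samples)
    n = len(xs)
    worst = 0.0
    for i, t in enumerate(xs):
        F = normal_cdf(t)
        worst = max(worst, abs(F - i / n), abs((i + 1) / n - F))
    return worst
```

Feeding this function many independent samples of the normalised short-interval sum gives a rough numerical check of how close the distribution is to Gaussian in a given range of $x$ and $y$.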

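Corollary 1.2. With notations as in Theorem 1.1 we have that

\[ \sup_{t \in \mathbb{R}} \Big| \mathbb{P}\Big( \frac{1}{\sqrt{S}} \sum_{x < n \le x+y} X(n) \le t \Big) - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-z^2/2}\, dz \Big| \]

is bounded by a constant times

\[ \Big(\frac{y}{S}\Big)^{3/4} \frac{1}{(\log 1/\delta)^{1/4}} + \Big(\frac{y}{S}\Big)^{1/2} (\delta \log x)^{1/4} + \frac{(y \log x)^{1/2}}{S^{3/4} (\log y)^{1/2}}. \]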

In an interval $[x, x + y]$ we expect that there are about $\frac{6}{\pi^2} y$ square-free
integers. The work of Filaseta and Trifonov [2] shows that if
$x \ge y \ge C x^{1/5} \log x$ for some positive constant $C$ then a positive proportion
of the integers in $[x, x + y]$ are square-free. The theorem in Filaseta
and Trifonov only asserts the existence of a square-free integer in such
an interval, but their proof plainly gives the stronger result above.
Therefore for all short intervals with $C x^{1/5} \log x \le y = o(x/\log x)$, our
Theorem shows that the distribution of $\sum_{x < n \le x+y} X(n)$ is approximately
normal. Granville [3] has shown that the ABC-conjecture implies
that the interval $[x, x + y]$ contains a positive proportion of square-free
integers if $x^{\epsilon} \ll y \le x$ for any $\epsilon > 0$; again Granville only stated
the existence of one square-free integer in such intervals, but his proof
gives the stronger assertion above. Thus, on the ABC-conjecture, for
any short interval with $x^{\epsilon} \ll y = o(x/\log x)$ our Theorem shows that
the distribution of $\sum_{x < n \le x+y} X(n)$ is approximately normal.

The proof of this result is based on a version of Stein's method for
normal approximation developed in [1]. This involves calculating quantities
related to the fourth moment of $\sum_{x < n \le x+y} X(n)$. The fourth
moment itself is calculated in Proposition 3.1 below. If the interval
$[x, x + y]$ contains a positive proportion of square-free numbers, then
Proposition 3.1 shows that the fourth moment is asymptotically the
fourth moment of a normal distribution provided $y = o(x/\log x)$. Further,
when $y$ is of size a constant times $x/\log x$, the argument there
shows that the fourth moment does not match the fourth moment of a
normal distribution. Thus it seems plausible that for $x/\log x \ll y \le x$
the distribution of $\sum_{x < n \le x+y} X(n)$ is not normal, but we do not have
a proof of this assertion. By modifying the conditioning argument in
Harper [5] we can establish that if $y$ is of size a constant times $x$ then the
distribution of $\sum_{x < n \le x+y} X(n)$ is not normal.

The method developed here could also be used to study the distribution
of $\sum_{n \in \mathcal{S}} X(n)$ for other subsets $\mathcal{S}$ of square-free numbers in
$[1, x]$. For example, we can obtain in this manner a different treatment
of the results of Harper and Hough. Another example is the set of
integers below $x$ that are $\equiv a \pmod{q}$ where $(a, q) = 1$. If $q/\log x$ is
large, and this arithmetic progression contains the expected number of
square-free integers, the distribution should be normal analogously to
Theorem 1.1.

2. Beginning of the proof 

Let $x$, $y$ and $\delta$ be as in the statement of the Theorem, and let $X$
denote a random multiplicative function as defined in the Introduction.
We let $z$ denote $\frac{1}{2}\log(1/\delta)$. We divide the primes below $2x$ into large
(that is $> z$) and small (that is $\le z$) primes. We denote the set of large
primes by $\mathcal{L}$, and the set of small primes by $\mathcal{S}$. Let $\mathcal{F}$ be the sigma-algebra
generated by $X(p)$ for all $p \in \mathcal{S}$, and we denote the conditional
expectation given $\mathcal{F}$ by $\mathbb{E}^{\mathcal{F}}$.

Let $X_{\mathcal{L}}$ denote the vector $(X(p))_{p \in \mathcal{L}}$. Then, given $\mathcal{F}$, we may think
of $\sum_{x < n \le x+y} X(n)$ as a function of $X_{\mathcal{L}}$, and we write this function as
$f(X_{\mathcal{L}})$.

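Lemma 2.1. With the above notations we have

\[ \mathbb{E}^{\mathcal{F}}\big(f(X_{\mathcal{L}})\big) = 0, \]

and

\[ \mathbb{E}^{\mathcal{F}}\big(f(X_{\mathcal{L}})^2\big) = S(x, y). \]

Proof. Write a square-free number $n \in [x, x + y]$ as $n_{\mathcal{S}} n_{\mathcal{L}}$ where $n_{\mathcal{S}}$ is
the product of the primes in $\mathcal{S}$ that divide $n$, and $n_{\mathcal{L}}$ the product of
the primes in $\mathcal{L}$ that divide $n$. From our choice of $z = \frac{1}{2}\log(1/\delta)$ we
note that $n_{\mathcal{S}} \le \prod_{p \le z} p < 1/\delta$, and it follows that $n_{\mathcal{L}} = n/n_{\mathcal{S}} > \delta x = y$.

In particular every such $n$ has at least one large prime factor, and
from this we obtain that $\mathbb{E}^{\mathcal{F}}(f(X_{\mathcal{L}})) = 0$. Moreover, note that if $n$
and $n'$ are distinct square-free numbers in $[x, x + y]$ then we must have
$n_{\mathcal{L}} \ne n'_{\mathcal{L}}$ (otherwise $n_{\mathcal{L}} > y$ would divide $n - n'$, which is nonzero and at most $y$
in absolute value). Therefore we deduce that $\mathbb{E}^{\mathcal{F}}(f(X_{\mathcal{L}})^2) = S(x, y)$, proving
our Lemma. □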
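Lemma 2.1 can be checked exactly on a small example by enumerating every sign pattern on the large primes for each fixed sign pattern on the small primes. The sketch below does this by brute force; the helper names and the test interval are illustrative assumptions, and only primes that actually divide a square-free integer of the interval are enumerated, since the other primes do not affect the sum.

```python
import itertools
import math

def prime_factors(n):
    """Prime factorisation of n as a dict {prime: exponent}, by trial division."""
    out, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            out[p] = out.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        out[n] = out.get(n, 0) + 1
    return out

def conditional_moments(x, y):
    """Return (S, list of (conditional mean, conditional second moment)).

    One pair is produced for each assignment of signs to the small primes
    (those p <= z with z = log(1/delta)/2 and delta = y/x).
    """
    z = 0.5 * math.log(x / y)
    squarefree = [sorted(f) for f in map(prime_factors, range(x, x + y + 1))
                  if all(e == 1 for e in f.values())]
    primes = sorted({p for f in squarefree for p in f})
    small = [p for p in primes if p <= z]
    large = [p for p in primes if p > z]
    results = []
    for small_signs in itertools.product((-1, 1), repeat=len(small)):
        sign = dict(zip(small, small_signs))
        first = second = 0
        for large_signs in itertools.product((-1, 1), repeat=len(large)):
            sign.update(zip(large, large_signs))
            total = sum(math.prod(sign[p] for p in f) for f in squarefree)
            first += total
            second += total * total
        count = 2 ** len(large)
        results.append((first / count, second / count))
    return len(squarefree), results
```

For instance, `conditional_moments(300, 5)` uses $\delta = 1/60$, so that $z \approx 2.05$ and the only small prime is 2; the enumeration stays tiny because only seven large primes occur.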




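Let $X'_{\mathcal{L}}$ denote an independent copy of $X_{\mathcal{L}}$. For each subset $A$ of $\mathcal{L}$
we write $X^{A}$ to be the vector defined as $X^{A}(p) = X(p)$ for $p \in \mathcal{L} \setminus A$,
and $X^{A}(p) = X'(p)$ for $p \in A$. For a proper subset $A$ of $\mathcal{L}$, and a
prime $p \in \mathcal{L} \setminus A$ we define

\[ \Delta_p f := f(X_{\mathcal{L}}) - f(X^{\{p\}}) \]

and

\[ \Delta_p f^{A} := f(X^{A}) - f(X^{A \cup \{p\}}). \]

Finally define

\[ T := \frac{1}{2} \sum_{p \in \mathcal{L}} \sum_{A \subseteq \mathcal{L} \setminus \{p\}} \frac{\Delta_p f \, \Delta_p f^{A}}{\binom{|\mathcal{L}|}{|A|} (|\mathcal{L}| - |A|)}. \]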

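With these notations, and Lemma 2.1, Theorem 2.2 from [1] enables
us to get the following result.

Proposition 2.2. Let $Z$ denote a random variable with a Gaussian distribution
with mean zero and variance 1. Let $W = \frac{1}{\sqrt{S}} \sum_{x < n \le x+y} X(n)$,
and let $\phi$ denote a Lipschitz function satisfying $|\phi(\alpha) - \phi(\beta)| \le |\alpha - \beta|$
for all real numbers $\alpha$ and $\beta$. We have

\[ \big| \mathbb{E}^{\mathcal{F}}(\phi(W)) - \mathbb{E}(\phi(Z)) \big| \le \frac{1}{S} \Big( \mathrm{Var}^{\mathcal{F}}\big( \mathbb{E}^{\mathcal{F}}(T \mid X) \big) \Big)^{1/2} + \frac{1}{2 S^{3/2}} \sum_{p \in \mathcal{L}} \mathbb{E}^{\mathcal{F}} |\Delta_p f|^3. \]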

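Here conditioning on $X$ means that we are conditioning on the
whole vector $(X(n))_{n \ge 1}$. Actually, the bound given by Theorem 2.2
from the paper [1] has $\mathrm{Var}^{\mathcal{F}}(\mathbb{E}^{\mathcal{F}}(T \mid W))$ in the first term instead of
$\mathrm{Var}^{\mathcal{F}}(\mathbb{E}^{\mathcal{F}}(T \mid X))$. However, the latter quantity is at least as large as
the former because $\mathbb{E}^{\mathcal{F}}(T \mid W) = \mathbb{E}^{\mathcal{F}}(\mathbb{E}^{\mathcal{F}}(T \mid X) \mid W)$ and conditioning
reduces variance.

We shall use Proposition 2.2 to estimate $|\mathbb{E}(\phi(W)) - \mathbb{E}(\phi(Z))|$. Note
that this quantity is bounded by

\[ \mathbb{E}\big| \mathbb{E}^{\mathcal{F}}(\phi(W)) - \mathbb{E}(\phi(Z)) \big| \le \frac{1}{S}\, \mathbb{E}\Big( \big( \mathrm{Var}^{\mathcal{F}}( \mathbb{E}^{\mathcal{F}}(T \mid X) ) \big)^{1/2} \Big) + \frac{1}{2 S^{3/2}} \sum_{p \in \mathcal{L}} \mathbb{E} |\Delta_p f|^3. \]

By the Cauchy-Schwarz inequality the first term above is

\[ \le \frac{1}{S} \Big( \mathbb{E}\, \mathrm{Var}^{\mathcal{F}}\big( \mathbb{E}^{\mathcal{F}}(T \mid X) \big) \Big)^{1/2} \le \frac{1}{S} \Big( \mathrm{Var}\big( \mathbb{E}^{\mathcal{F}}(T \mid X) \big) \Big)^{1/2}. \]

We deduce that

\[ \big| \mathbb{E}(\phi(W)) - \mathbb{E}(\phi(Z)) \big| \le \frac{1}{S} \Big( \mathrm{Var}\big( \mathbb{E}^{\mathcal{F}}(T \mid X) \big) \Big)^{1/2} + \frac{1}{2 S^{3/2}} \sum_{p \in \mathcal{L}} \mathbb{E} |\Delta_p f|^3. \tag{1} \]
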
We will now focus on estimating the two terms in the RHS above.
The second term will be estimated in the next section, and the first in
Section 4. We now simplify the expression in the first term a little.

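For each $p \in \mathcal{L}$, let $\mathcal{N}(p)$ denote all square-free numbers in the
interval $[x/p, (x + y)/p]$ which are coprime to $p$. Note that

\[ \Delta_p f = (X(p) - X'(p)) \sum_{k \in \mathcal{N}(p)} X(k), \]

and if $p \in \mathcal{L} \setminus A$,

\[ \Delta_p f^{A} = (X(p) - X'(p)) \sum_{k \in \mathcal{N}(p)} X^{A}(k), \]

where $X^{A}(k)$ is defined in the obvious way replacing $X$ by $X'$ on the
primes in $A$. Therefore

\[ \Delta_p f \, \Delta_p f^{A} = (X(p) - X'(p))^2 \sum_{k, \ell \in \mathcal{N}(p)} X(k) X^{A}(\ell), \]

and since $X(k)$ and $X^{A}(\ell)$ do not depend on $X(p)$, we see that

\[ \frac{1}{2}\, \mathbb{E}^{\mathcal{F}}\big( \Delta_p f \, \Delta_p f^{A} \mid X \big) = \sum_{k \in \mathcal{N}(p)} \sum_{\ell \in \mathcal{N}^{A}(p)} X(k) X(\ell) = |\mathcal{N}^{A}(p)| + \sum_{k \in \mathcal{N}(p)} \sum_{\substack{\ell \in \mathcal{N}^{A}(p) \\ \ell \ne k}} X(k) X(\ell), \]

where $\mathcal{N}^{A}(p)$ denotes the set of all square-free integers in $[x/p, (x+y)/p]$
that are not divisible by $p$ or by any prime $q \in A$. Write the quantity $T$
in Proposition 2.2 as

\[ T = \frac{1}{2} \sum_{p \in \mathcal{L}} \sum_{A \subseteq \mathcal{L} \setminus \{p\}} \nu(A)\, \Delta_p f \, \Delta_p f^{A}, \]

where

\[ \nu(A) = \frac{1}{\binom{|\mathcal{L}|}{|A|} (|\mathcal{L}| - |A|)}. \]
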
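Thus,

\[ \begin{aligned} \mathbb{E}^{\mathcal{F}}(T \mid X) &= \frac{1}{2} \sum_{p \in \mathcal{L}} \sum_{A \subseteq \mathcal{L} \setminus \{p\}} \nu(A)\, \mathbb{E}^{\mathcal{F}}\big( \Delta_p f \, \Delta_p f^{A} \mid X \big) \\ &= \sum_{p \in \mathcal{L}} \sum_{A \subseteq \mathcal{L} \setminus \{p\}} \nu(A)\, |\mathcal{N}^{A}(p)| + \sum_{p \in \mathcal{L}} \sum_{k \in \mathcal{N}(p)} \sum_{\ell \in \mathcal{N}(p) \setminus \{k\}} \frac{X(k) X(\ell)}{1 + \omega_{\mathcal{L}}(\ell)}, \end{aligned} \]

where $\omega_{\mathcal{L}}(n)$ denotes the number of distinct prime factors of $n$ that are
in $\mathcal{L}$ and the equality above holds because

\[ \sum_{\substack{A \subseteq \mathcal{L} \setminus \{p\} \\ q \nmid \ell \text{ for all } q \in A}} \nu(A) = \sum_{A \subseteq \mathcal{L} \setminus (\{p\} \cup \{q : q \mid \ell\})} \nu(A) = \sum_{k=0}^{|\mathcal{L}| - 1 - \omega_{\mathcal{L}}(\ell)} \frac{\binom{|\mathcal{L}| - 1 - \omega_{\mathcal{L}}(\ell)}{k}}{\binom{|\mathcal{L}|}{k} (|\mathcal{L}| - k)} = \frac{1}{1 + \omega_{\mathcal{L}}(\ell)}, \]

with $\nu(A) = 1 / \big( \binom{|\mathcal{L}|}{|A|} (|\mathcal{L}| - |A|) \big)$.
The last step above involves a combinatorial identity and we leave the
pleasure of proving it to the reader; a generalization of this identity
appears as Problem B2 of the 1987 Putnam competition; see [9]. Now
we define

\[ T_p := \sum_{k \in \mathcal{N}(p)} \sum_{\ell \in \mathcal{N}(p) \setminus \{k\}} \frac{X(k) X(\ell)}{1 + \omega_{\mathcal{L}}(\ell)}. \]

Since the first sum in the expression for $\mathbb{E}^{\mathcal{F}}(T \mid X)$ is not random, we may conclude that

\[ \mathrm{Var}\big( \mathbb{E}^{\mathcal{F}}(T \mid X) \big) = \mathrm{Var}\Big( \sum_{p \in \mathcal{L}} T_p \Big). \tag{2} \]
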
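The combinatorial identity used in this section, namely that $\sum_{k=0}^{n-1-m} \binom{n-1-m}{k} \big/ \big( \binom{n}{k}(n-k) \big) = 1/(1+m)$ with $n$ playing the role of the number of large primes and $m$ the number of large prime factors of $\ell$, is easy to confirm in exact rational arithmetic for small parameters. The following Python sketch does so; the function name `weight_sum` is an illustrative choice, and the check is numerical evidence rather than a proof.

```python
from fractions import Fraction
from math import comb

def weight_sum(n, m):
    """Sum over k = 0, ..., n-1-m of C(n-1-m, k) / (C(n, k) * (n - k)).

    Here n stands for the number of large primes and m for the number of
    large primes dividing a fixed integer (so 0 <= m <= n - 1).
    """
    return sum(Fraction(comb(n - 1 - m, k), comb(n, k) * (n - k))
               for k in range(n - m))

def identity_holds(max_n):
    """Check weight_sum(n, m) == 1/(1+m) for all 1 <= n <= max_n, 0 <= m < n."""
    return all(weight_sum(n, m) == Fraction(1, 1 + m)
               for n in range(1, max_n + 1) for m in range(n))
```

Using `Fraction` avoids floating-point error, so an equality test is meaningful here.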

3. The fourth moment and a parametrization of solutions 
In this section we shall evaluate the fourth moment

\[ \mathbb{E}\Big( \Big( \sum_{x < n \le x+y} X(n) \Big)^4 \Big) \]

for a suitable range of the variables $x$ and $y$. The techniques involved
in this calculation will be used in the proof of our main Theorem.
When we expand out the fourth moment, we find that we are counting
solutions to the equation

\[ n_1 n_2 n_3 n_4 = \square \]

where $n_1$, $n_2$, $n_3$, $n_4$ are square-free integers with $n_j \in [x, x + y]$ and $\square$
denotes a perfect square. Recall that $y = x\delta$. We begin by parametrizing
such solutions.




Write $A = (n_1, n_2)$ and $B = (n_3, n_4)$, and set $n_1 = A n_1'$, $n_2 = A n_2'$,
$n_3 = B n_3'$ and $n_4 = B n_4'$. Then $(n_1', n_2') = (n_3', n_4') = 1$ and the equation
$n_1 n_2 n_3 n_4 = \square$ is equivalent to $n_1' n_2' = n_3' n_4'$. Now write $r = (n_1', n_3')$ and
$s = (n_2', n_4')$. Then $(r, s) = 1$ and we see that $n_1' = ru$, $n_3' = rv$, $n_2' = sv$
and $n_4' = su$ where $u$ and $v$ are natural numbers with $(u, v) = 1$.

Summarizing the above paragraph, we see that the solutions to
$n_1 n_2 n_3 n_4 = \square$ are parametrized by six variables $A$, $B$, $r$, $s$, $u$, $v$,
with $(r, s) = (u, v) = 1$ and with

\[ n_1 = Aru, \quad n_2 = Asv, \quad n_3 = Brv, \quad n_4 = Bsu. \]

There are additional coprimality conditions to ensure that these numbers
are square-free. Since $(1 + \delta)^{-2} \le n_1 n_2/(n_3 n_4) = (A/B)^2 \le (1 + \delta)^2$ we see
that

\[ (1 + \delta)^{-1} \le A/B \le (1 + \delta). \]

Similarly using $n_1 n_3/(n_2 n_4) = (r/s)^2$ we have

\[ (1 + \delta)^{-1} \le r/s \le (1 + \delta), \]

and finally using $n_1 n_4/(n_2 n_3) = u^2/v^2$ we get that

\[ (1 + \delta)^{-1} \le u/v \le (1 + \delta). \]

In what follows we shall make use of this parametrization and the
above inequalities for the ratios $A/B$, $r/s$, $u/v$. One consequence of
these inequalities is that if $A \ne B$ then $A$ and $B$ are both $\ge 1/\delta$.
Similarly if $r \ne s$ then both $r$ and $s$ are $\ge 1/\delta$ and if $u \ne v$ then $u$ and
$v$ are both $\ge 1/\delta$.
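The parametrization can be tested mechanically: for every quadruple of square-free integers in a small interval whose product is a perfect square, compute $A, B, r, s, u, v$ by the gcd recipe described in this section and confirm the four factorizations. The Python sketch below does this by brute force; the function names are illustrative and the exhaustive loop is only meant for very short intervals.

```python
from itertools import product
from math import gcd, isqrt

def is_squarefree(n):
    """True when no square d*d with d >= 2 divides n."""
    return all(n % (d * d) for d in range(2, isqrt(n) + 1))

def parametrize(n1, n2, n3, n4):
    """Return (A, B, r, s, u, v) for square-free n_j with n1*n2*n3*n4 a square."""
    A, B = gcd(n1, n2), gcd(n3, n4)
    m1, m2, m3, m4 = n1 // A, n2 // A, n3 // B, n4 // B
    r, s = gcd(m1, m3), gcd(m2, m4)
    u, v = m1 // r, m3 // r
    return A, B, r, s, u, v

def check_interval(x, y):
    """Verify the parametrization on all solutions in [x, x+y]; return their number."""
    sf = [n for n in range(x, x + y + 1) if is_squarefree(n)]
    checked = 0
    for n1, n2, n3, n4 in product(sf, repeat=4):
        value = n1 * n2 * n3 * n4
        if isqrt(value) ** 2 != value:
            continue                      # the product is not a perfect square
        A, B, r, s, u, v = parametrize(n1, n2, n3, n4)
        assert (n1, n2, n3, n4) == (A * r * u, A * s * v, B * r * v, B * s * u)
        assert gcd(r, s) == 1 and gcd(u, v) == 1
        checked += 1
    return checked
```

The recipe does not depend on the four integers lying in a common short interval, so it can also be tried on an isolated solution such as `parametrize(6, 10, 15, 1)`.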

Proposition 3.1. Call any solution to $n_1 n_2 n_3 n_4 = \square$ where the variables
are equal in pairs a diagonal solution. The number of non-diagonal
solutions to $n_1 n_2 n_3 n_4 = \square$ with $n_j \in [x, x(1 + \delta)]$ and $n_j$
square-free is at most

\[ 80 x^2 \delta^3 (1 + 2\log x)(1 + 2\delta \log x). \]

Therefore, with $S$ denoting the number of square-free integers in $[x, x(1 + \delta)]$,

\[ \mathbb{E}\Big( \Big( \sum_{k=x}^{x(1+\delta)} X(k) \Big)^4 \Big) = 3S^2 + O\big( x^2 \delta^3 (1 + \delta \log x) \log x \big). \]


Proof. Suppose $A$, $B$, $r$, $s$, $u$, $v$ parametrize a non-diagonal solution to
$n_1 n_2 n_3 n_4 = \square$. Then either one of $u$ or $v$ is not 1, or one of $r$ or $s$ is
not 1; for if $u = v = 1$ and $r = s = 1$ then $n_1 = n_2$ and $n_3 = n_4$. Since
these cases are symmetric we will only deal with the case when one of
$u$ or $v$ is not 1, and the total number of solutions is at most twice the
number of solutions in this case.

Suppose then that $u$ or $v$ is not 1, and since $(u, v) = 1$ this means
that $u \ne v$ and so both $u$ and $v$ are $\ge 1/\delta$. Therefore it follows that
$Ar \le x\delta(1 + \delta)$. Further either $A \ne B$ or $r \ne s$, and so either $A$ or $r$
must be $\ge 1/\delta$. Now suppose $A$ and $r$ are given with $\max(A, r) \ge 1/\delta$
and $Ar \le x\delta(1 + \delta)$. Since $(1 + \delta)^{-1} \le A/B \le (1 + \delta)$ it follows that
there are at most $(1 + 2A\delta)$ choices for $B$. Similarly since $(1 + \delta)^{-1} \le
r/s \le 1 + \delta$ there are at most $1 + 2r\delta$ choices for $s$. Finally since
$Aru \in [x, x(1 + \delta)]$ there are at most $1 + x\delta/(Ar) \le 3x\delta/(Ar)$ choices
for $u$, and similarly there are at most $1 + x\delta/(Bs) \le 3x\delta/(Ar)$ choices
for $v$. Thus the total number of such solutions is

\[ \le \sum_{\substack{\max(A, r) \ge 1/\delta \\ Ar \le x\delta(1 + \delta)}} (1 + 2A\delta)(1 + 2r\delta) \frac{9 x^2 \delta^2}{A^2 r^2}. \]

This may be bounded by

\[ 18 x^2 \delta^2 \sum_{x\delta(1+\delta) \ge A \ge 1/\delta} \frac{1 + 2A\delta}{A^2} \sum_{x\delta(1+\delta) \ge r \ge 1} \frac{1 + 2r\delta}{r^2} \le 40 x^2 \delta^3 (1 + 2\log x)(1 + 2\delta \log x), \]

proving our Proposition. □
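For small parameters the fourth moment can be computed exactly by counting solutions of $n_1 n_2 n_3 n_4 = \square$, which gives a quick way to see how many non-diagonal solutions an interval carries. The Python sketch below is one way to do this under the assumption that grouping ordered pairs by the square-free kernel of their product is acceptable; this pair-grouping shortcut and the function names are illustrative and are not the counting argument used in the proof.

```python
from collections import Counter
from math import gcd, isqrt

def squarefree_in(x, y):
    """Square-free integers in [x, x + y]."""
    return [n for n in range(x, x + y + 1)
            if all(n % (d * d) for d in range(2, isqrt(n) + 1))]

def fourth_moment(x, y):
    """Return (S, number of solutions, number of non-diagonal solutions).

    The number of solutions equals the fourth moment of the sum of X(n)
    over the square-free n in [x, x + y].
    """
    sf = squarefree_in(x, y)
    # For square-free a and b the square-free kernel of a*b is a*b / gcd(a, b)^2.
    kernels = Counter(a * b // gcd(a, b) ** 2 for a in sf for b in sf)
    # n1*n2*n3*n4 is a square exactly when n1*n2 and n3*n4 have the same kernel.
    total = sum(c * c for c in kernels.values())
    S = len(sf)
    diagonal = 3 * S * S - 2 * S          # variables equal in pairs
    return S, total, total - diagonal
```

The diagonal count $3S^2 - 2S$ comes from the three ways of pairing the four variables, corrected for quadruples with all entries equal, which would otherwise be counted three times.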



When $S$ is of size $x\delta$ (which holds if $x\delta \gg x^{1/5} \log x$), Proposition 3.1
shows that provided $\delta = o(1/\log x)$, the fourth moment matches the
fourth moment of a Gaussian.

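We now use the ideas of this section to bound the term $\sum_{p \in \mathcal{L}} \mathbb{E}|\Delta_p f|^3$
arising in Proposition 2.2. By the Cauchy-Schwarz inequality we have

\[ \mathbb{E}|\Delta_p f|^3 \le \big( \mathbb{E}|\Delta_p f|^2 \big)^{1/2} \big( \mathbb{E}|\Delta_p f|^4 \big)^{1/2}. \]

As before let $\mathcal{N}(p)$ denote the square-free integers in $[x/p, (x + y)/p]$
which are not multiples of $p$. Then

\[ \mathbb{E}|\Delta_p f|^2 = 2 \sum_{k \in \mathcal{N}(p)} 1 \le 2\Big( \frac{y}{p} + 1 \Big). \]

Further we have

\[ \mathbb{E}|\Delta_p f|^4 = 8 \sum_{\substack{k_1, k_2, k_3, k_4 \in \mathcal{N}(p) \\ k_1 k_2 k_3 k_4 = \square}} 1, \]

and arguing as in Proposition 3.1 we find that this is $\ll (1 + y/p)^2$
provided $\delta \le 1/\log x$, where $\ll$ means $\le$ up to a constant multiple.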
Therefore we conclude that

\[ \mathbb{E}|\Delta_p f|^3 \ll 1 + \Big( \frac{y}{p} \Big)^{3/2}. \]

Using this estimate for primes $z < p \le y$ we find that

\[ \sum_{z < p \le y} \mathbb{E}|\Delta_p f|^3 \ll y^{3/2} \sum_{z < p} \frac{1}{p^{3/2}} \ll \frac{y^{3/2}}{z^{1/2}}. \]

If $p > y$ then $\mathbb{E}|\Delta_p f|^3 = 0$ unless there happens to be a square-free
multiple of $p$ in $[x, x + y]$ and in this case the expectation is 4. Such
primes $p$ must divide $\prod_{x < n \le x+y} n \le (x + y)^{y}$ and there are at most
$y \log(x + y)/\log y$ possibilities for such primes $p$. We conclude that

\[ \sum_{p \in \mathcal{L}} \mathbb{E}|\Delta_p f|^3 \ll \frac{y^{3/2}}{z^{1/2}} + \frac{y \log x}{\log y}. \tag{3} \]




4. Proof of the Theorem 

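We now estimate $\mathrm{Var}\big( \sum_{p \in \mathcal{L}} T_p \big)$ where we recall that $T_p$ is defined in
§2. This quantity equals

\[ \sum_{p, q \in \mathcal{L}} \; \sum_{\substack{k \in \mathcal{N}(p) \\ \ell \in \mathcal{N}(p) \setminus \{k\}}} \; \sum_{\substack{k' \in \mathcal{N}(q) \\ \ell' \in \mathcal{N}(q) \setminus \{k'\}}} \frac{\mathbb{E}\big( X(k) X(\ell) X(k') X(\ell') \big)}{(1 + \omega_{\mathcal{L}}(\ell))(1 + \omega_{\mathcal{L}}(\ell'))}. \]

Above we allow for the possibility that $p$ equals $q$. The expectation
above is 1 exactly when $k \ell k' \ell'$ is a square and zero otherwise. Thus
writing $n_1 = kp$, $n_2 = \ell p$, $n_3 = k'q$, $n_4 = \ell' q$ the quantity we seek is

\[ \sum_{p, q \in \mathcal{L}} \; \sum_{\substack{n_1, n_2, n_3, n_4 \\ p \mid (n_1, n_2),\; q \mid (n_3, n_4)}} \frac{1}{(1 + \omega_{\mathcal{L}}(n_2/p))(1 + \omega_{\mathcal{L}}(n_4/q))}, \]

where $n_j \in [x, x(1 + \delta)]$, $n_1 \ne n_2$, $n_3 \ne n_4$, the $n_j$ are square-free
with $n_1 n_2 n_3 n_4 = \square$, and $(n_1, n_2)$ and $(n_3, n_4)$ must contain at least one
prime factor from $\mathcal{L}$.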

We use the parametrization developed in §3 to estimate this. In the
notation used there we find that our quantity above is

\[ \le \sum_{A, B, r, s, u, v} 1. \tag{4} \]

The sum above is over all $A$, $B$, $r$, $s$, $u$, $v$ as in our parametrization
with the further restraints that $Aru \ne Asv$ and $Brv \ne Bsu$, and that
$A$ and $B$ must each contain at least one prime factor from $\mathcal{L}$. Our goal
is to show that the above quantity is bounded by

\[ O\Big( x^2 \delta^2 (1 + \delta \log x) \Big( \frac{1}{\log(1/\delta)} + \delta \log x \Big) \Big). \tag{5} \]

We will obtain this by first fixing $A$ and $r$ and analyzing the restraints
on the other variables.

Suppose first that $A$ and $r$ are chosen with $Ar > \delta(x + y)$. If $u \ne v$
then both $u$ and $v$ must be $\ge 1/\delta$ and then we would have $Aru > x + y$.
Thus we must have $u = v$ and since $(u, v) = 1$ we have $u = v = 1$. Now
$r \ne s$ (else $n_1 = Ar = n_2$) and so we have that both $r$ and $s$ are at least
$1/\delta$. Thus we have $A \ll x\delta$ and $Ar \in [x, x + y]$. Given $r$ the condition
$(1 + \delta)^{-1} \le r/s \le (1 + \delta)$ shows that there are $\ll r\delta$ choices for $s$.
Similarly the inequality $(1 + \delta)^{-1} \le A/B \le (1 + \delta)$ shows that given
$A$ there are $\ll 1 + A\delta$ choices for $B$. Thus in this case our quantity is

\[ \ll \sum_{A \ll x\delta} \; \sum_{x/A \le r \le (x+y)/A} r\delta\,(1 + A\delta) \ll x^2 \delta^2 \sum_{A \le x} \frac{1 + A\delta}{A^2} \ll x^2 \delta^2 \Big( \frac{1}{\log(1/\delta)} + \delta \log x \Big). \]

The final estimate follows because $A$ must contain at least one prime
factor from $\mathcal{L}$ so that $A > z$ and hence $\sum_{A} 1/A^2 \ll 1/z$.

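Now suppose that $Ar \le \delta(x + y)$. Recall that either $r = s = 1$ or
that both $r$ and $s$ are at least $1/\delta$. We consider these cases separately.
In the former case, note that $B$ has $\ll 1 + A\delta$ choices, and $u$ and $v$
have $\ll x\delta/A$ choices each. Thus this case contributes

\[ \ll \sum_{A \le \delta(x+y)} (1 + A\delta) \frac{x^2 \delta^2}{A^2} \ll x^2 \delta^2 \Big( \frac{1}{z} + \delta \log x \Big). \]

Now suppose that we have the second case when $r \ge 1/\delta$. Here there
are $\ll 1 + A\delta$ choices for $B$, and given $r$ there are $\ll r\delta$ choices for $s$.
Finally there are $\ll x\delta/(Ar)$ choices for $u$ and $\ll x\delta/(Bs) \ll x\delta/(Ar)$
choices for $v$. Thus the contribution here is

\[ \ll \sum_{A, r} (1 + A\delta)\, r\delta\, \frac{x^2 \delta^2}{A^2 r^2} \ll x^2 \delta^3 \sum_{A} \frac{1 + A\delta}{A^2} \sum_{r} \frac{1}{r} \ll x^2 \delta^3 \log x \Big( \frac{1}{z} + \delta \log x \Big). \]
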
Putting all these estimates together gives our bound (5).

Using the bound (5), together with (1), (2) and (3) we conclude that
$|\mathbb{E}(\phi(W)) - \mathbb{E}(\phi(Z))|$ is

\[ \ll \Big( \frac{y}{S} \Big)^{3/2} \frac{1}{(\log 1/\delta)^{1/2}} + \frac{y \log x}{S^{3/2} \log y} + \frac{y}{S} (1 + \delta \log x)^{1/2} \Big( \frac{1}{\log(1/\delta)} + \delta \log x \Big)^{1/2}. \]

To deduce the Theorem we combine the above bound with the following
simple estimate for $|\mathbb{E}(\phi(W)) - \mathbb{E}(\phi(Z))|$. Since $\phi$ is Lipschitz we have
$|\phi(t) - \phi(0)| \le |t|$, and so

\[ |\mathbb{E}(\phi(W)) - \mathbb{E}(\phi(Z))| \le |\mathbb{E}(\phi(W) - \phi(0))| + |\mathbb{E}(\phi(Z) - \phi(0))| \le \mathbb{E}(|W|) + \mathbb{E}(|Z|) \le 2. \]

5. Proof of the Corollary 

Let $\nu$ denote a Gaussian distribution with mean 0 and variance 1, and
let $\mu$ denote a probability measure. We claim that

\[ \mathcal{K}(\mu, \nu) \le 2\sqrt{\mathcal{W}(\mu, \nu)}, \tag{6} \]

and Corollary 1.2 follows as a special case of this estimate.

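For any real number $t$, and a parameter $\epsilon > 0$ consider the function
$\Phi^{+}(\xi; t, \epsilon)$ defined by

\[ \Phi^{+}(\xi; t, \epsilon) = \begin{cases} \epsilon & \text{if } \xi \in (-\infty, t) \\ t + \epsilon - \xi & \text{if } \xi \in [t, t + \epsilon] \\ 0 & \text{if } \xi > t + \epsilon. \end{cases} \]

Note that $\Phi^{+}(\cdot\,; t, \epsilon)$ is Lipschitz, and moreover $\Phi^{+}(\xi; t, \epsilon) \ge \epsilon\, \chi_{(-\infty, t)}(\xi)$.
Therefore

\[ \int_{-\infty}^{t} d\mu \le \frac{1}{\epsilon} \int \Phi^{+}(\cdot\,; t, \epsilon)\, d\mu \le \frac{1}{\epsilon} \int \Phi^{+}(\cdot\,; t, \epsilon)\, d\nu + \frac{\mathcal{W}(\mu, \nu)}{\epsilon} \le \int_{-\infty}^{t} d\nu + \epsilon + \frac{\mathcal{W}(\mu, \nu)}{\epsilon}. \]

Choosing $\epsilon = \sqrt{\mathcal{W}(\mu, \nu)}$ we obtain that

\[ \int_{-\infty}^{t} d\mu \le \int_{-\infty}^{t} d\nu + 2\sqrt{\mathcal{W}(\mu, \nu)}. \]

An analogous argument, using a similar Lipschitz minorant of the
characteristic function of $(-\infty, t)$, gives that

\[ \int_{-\infty}^{t} d\mu \ge \int_{-\infty}^{t} d\nu - 2\sqrt{\mathcal{W}(\mu, \nu)}, \]

and so (6) follows.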
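The smoothing function used in this argument is simple enough to write down directly. The Python sketch below assumes the majorant equals $\epsilon$ to the left of $t$, vanishes to the right of $t + \epsilon$, and is linear in between (the only continuous choice with slope of absolute value at most 1 between those two values); the name `phi_plus` and the grid check are illustrative.

```python
def phi_plus(xi, t, eps):
    """Lipschitz majorant of eps times the indicator of (-infinity, t)."""
    if xi < t:
        return eps
    if xi <= t + eps:
        return t + eps - xi               # linear descent from eps to 0
    return 0.0

def is_one_lipschitz_on_grid(t, eps, lo, hi, steps):
    """Check |phi(a) - phi(b)| <= |a - b| for consecutive points of a uniform grid."""
    pts = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vals = [phi_plus(p, t, eps) for p in pts]
    return all(abs(vals[i + 1] - vals[i]) <= (pts[i + 1] - pts[i]) + 1e-12
               for i in range(steps))
```

Integrating `phi_plus` against an empirical measure and against the Gaussian measure, and dividing by `eps`, reproduces numerically the chain of inequalities that leads to (6).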